suppressMessages(library(scRepertoire))
suppressMessages(library(Seurat))
scRepertoire comes with a data set of T cells from three patients with renal clear cell carcinoma in order to demonstrate the functionality of the R package. More information on the data set can be found at preprint 1 and preprint 2. The samples consist of paired peripheral-blood and tumor-infiltrating runs, effectively creating 6 distinct runs for T cell receptor (TCR) enrichment. We can preview the elements in the list by using the head function and looking at the first contig annotation. Notice that the barcode is labeled as PX_P_#############, which refers to Patient X (PX) and Peripheral Blood (P).
If you are loading the filtered_contig_annotation.csv file into the R environment to create the list, you will also need to set stringsAsFactors to FALSE. This prevents the conversion of categorical variables into factors and is necessary for the evaluations built into some of the functions of scRepertoire. The code should look something like this:
csv1 <- read.csv("location/of/file.csv", stringsAsFactors = FALSE)
The object “contig_list” is created from 6 filtered_contig_annotation.csv files from 10x Genomics Cell Ranger. The object was created with contig_list <- list(csv1, csv2, ...).
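As a hedged sketch of how such a list might be assembled from several files at once, the following base R snippet writes two toy CSV files to a temporary directory and reads them back with stringsAsFactors = FALSE. The file names, the columns, and the object toy_contig_list are placeholders for illustration only; they are not the files shipped with the package.

```r
# Toy stand-ins for Cell Ranger contig annotation files (hypothetical content)
tmp_dir <- tempfile("contigs_")
dir.create(tmp_dir)
toy <- data.frame(
  barcode = c("AAACCTGAGAGCTGGT-1", "AAACCTGAGCATCATC-1"),
  chain   = c("TRB", "TRA"),
  cdr3    = c("CSASMGPVVSNQPQHF", "CVVNDEGSSNTGKLIF"),
  stringsAsFactors = FALSE
)
for (run in c("run1", "run2")) {
  write.csv(toy, file.path(tmp_dir, paste0(run, ".csv")), row.names = FALSE)
}

# Read every CSV in the directory into one list, keeping strings as characters
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
toy_contig_list <- lapply(files, read.csv, stringsAsFactors = FALSE)
```

Each element of toy_contig_list is then one run, in the same shape as list(csv1, csv2, ...).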
data("contig_list") #the data built into scRepertoire
head(contig_list[[1]])
## barcode is_cell contig_id high_confidence
## 1 PY_P_AAACCTGAGAGCTGGT TRUE AAACCTGAGAGCTGGT-1_contig_1 TRUE
## 2 PY_P_AAACCTGAGAGCTGGT TRUE AAACCTGAGAGCTGGT-1_contig_2 TRUE
## 3 PY_P_AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_1 TRUE
## 4 PY_P_AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_2 TRUE
## 5 PY_P_AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_5 TRUE
## 6 PY_P_AAACCTGAGTGGTCCC TRUE AAACCTGAGTGGTCCC-1_contig_1 TRUE
## length chain v_gene d_gene j_gene c_gene full_length productive
## 1 705 TRB TRBV20-1 TRBD1 TRBJ1-5 TRBC1 TRUE TRUE
## 2 502 TRB None None TRBJ1-5 TRBC1 FALSE None
## 3 693 TRB TRBV5-1 TRBD2 TRBJ2-2 TRBC2 TRUE TRUE
## 4 567 TRA TRAV12-1 None TRAJ37 TRAC TRUE TRUE
## 5 361 TRB None None TRBJ1-5 TRBC1 FALSE None
## 6 593 TRB TRBV7-9 TRBD1 TRBJ2-5 TRBC2 TRUE TRUE
## cdr3 cdr3_nt reads umis
## 1 CSASMGPVVSNQPQHF TGCAGTGCTAGCATGGGACCGGTAGTGAGCAATCAGCCCCAGCATTTT 16718 6
## 2 None None 6706 3
## 3 CASSWSGAGDGELFF TGCGCCAGCAGCTGGTCAGGAGCGGGAGACGGGGAGCTGTTTTTT 26719 11
## 4 CVVNDEGSSNTGKLIF TGTGTGGTGAACGATGAAGGCTCTAGCAACACAGGCAAACTAATCTTT 18297 6
## 5 None None 882 4
## 6 CASSPSEGGRQETQYF TGTGCCAGCAGCCCCTCCGAAGGGGGGAGACAAGAGACCCAGTACTTC 11218 6
## raw_clonotype_id raw_consensus_id
## 1 clonotype96 clonotype96_consensus_1
## 2 clonotype96 None
## 3 clonotype97 clonotype97_consensus_2
## 4 clonotype97 clonotype97_consensus_1
## 5 clonotype97 None
## 6 clonotype98 clonotype98_consensus_1
Some workflows add extra labels to the standard barcode. Before we proceed, we will use the function stripBarcode() in order to avoid any labeling issues down the line. Importantly, stripBarcode() is for removing prefixes on barcodes that have resulted from other pipelines.
There is no need for the stripBarcode() function if the barcodes look like AAACGGGAGATGGCGT-1 or AAACGGGAGATGGCGT.
When using stripBarcode(), please think about the parameters column, connector, and num_connects, as set in the call below.
for (i in seq_along(contig_list)) {
contig_list[[i]] <- stripBarcode(contig_list[[i]], column = 1, connector = "_", num_connects = 3)
}
head(contig_list[[1]])
## barcode is_cell contig_id high_confidence length
## 1 AAACCTGAGAGCTGGT TRUE AAACCTGAGAGCTGGT-1_contig_1 TRUE 705
## 2 AAACCTGAGAGCTGGT TRUE AAACCTGAGAGCTGGT-1_contig_2 TRUE 502
## 3 AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_1 TRUE 693
## 4 AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_2 TRUE 567
## 5 AAACCTGAGCATCATC TRUE AAACCTGAGCATCATC-1_contig_5 TRUE 361
## 6 AAACCTGAGTGGTCCC TRUE AAACCTGAGTGGTCCC-1_contig_1 TRUE 593
## chain v_gene d_gene j_gene c_gene full_length productive cdr3
## 1 TRB TRBV20-1 TRBD1 TRBJ1-5 TRBC1 TRUE TRUE CSASMGPVVSNQPQHF
## 2 TRB None None TRBJ1-5 TRBC1 FALSE None None
## 3 TRB TRBV5-1 TRBD2 TRBJ2-2 TRBC2 TRUE TRUE CASSWSGAGDGELFF
## 4 TRA TRAV12-1 None TRAJ37 TRAC TRUE TRUE CVVNDEGSSNTGKLIF
## 5 TRB None None TRBJ1-5 TRBC1 FALSE None None
## 6 TRB TRBV7-9 TRBD1 TRBJ2-5 TRBC2 TRUE TRUE CASSPSEGGRQETQYF
## cdr3_nt reads umis raw_clonotype_id
## 1 TGCAGTGCTAGCATGGGACCGGTAGTGAGCAATCAGCCCCAGCATTTT 16718 6 clonotype96
## 2 None 6706 3 clonotype96
## 3 TGCGCCAGCAGCTGGTCAGGAGCGGGAGACGGGGAGCTGTTTTTT 26719 11 clonotype97
## 4 TGTGTGGTGAACGATGAAGGCTCTAGCAACACAGGCAAACTAATCTTT 18297 6 clonotype97
## 5 None 882 4 clonotype97
## 6 TGTGCCAGCAGCCCCTCCGAAGGGGGGAGACAAGAGACCCAGTACTTC 11218 6 clonotype98
## raw_consensus_id
## 1 clonotype96_consensus_1
## 2 None
## 3 clonotype97_consensus_2
## 4 clonotype97_consensus_1
## 5 None
## 6 clonotype98_consensus_1
You can now see that the barcodes in column 1 no longer carry the PX_P_ style prefixes.
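To make the idea of prefix stripping concrete, here is a minimal base R sketch. The helper strip_prefix() is hypothetical and is not the implementation of stripBarcode(); it assumes the barcode is always the last piece after splitting on the connector.

```r
# Hypothetical helper: keep only the last piece after splitting on the connector
strip_prefix <- function(barcodes, connector = "_") {
  pieces <- strsplit(barcodes, connector, fixed = TRUE)
  vapply(pieces, function(p) p[length(p)], character(1))
}

stripped <- strip_prefix(c("PY_P_AAACCTGAGAGCTGGT", "AAACGGGAGATGGCGT-1"))
```

A barcode without a prefix passes through unchanged, which is why no stripping is needed for barcodes that are already in the standard form.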
Because the output of Cell Ranger quantifies both the TCRA and TCRB chains, the next step is to create a single list object with the TCR gene and CDR3 sequences by cell barcode. This is performed using combineContigs(), where the input is the stripped contig_list. The barcodes are also relabeled with the sample and ID information to prevent duplicates.
combined <- combineContigs(contig_list, samples = c("PY", "PY", "PX", "PX", "PZ","PZ"), ID = c("P", "T", "P", "T", "P", "T"), cells ="T-AB")
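The pairing of chains by barcode can be sketched in base R as follows. This is an illustration under simplifying assumptions (at most one TRA and one TRB contig per barcode, toy barcodes, a hypothetical "PX" / "P" label), not the logic of combineContigs() itself.

```r
# Toy contigs: one barcode with both chains, two with a single chain
contigs <- data.frame(
  barcode = c("AAAC", "AAAC", "CCCG", "GGGT"),
  chain   = c("TRA", "TRB", "TRB", "TRA"),
  cdr3    = c("CAVNGGSQGNLIF", "CSAEREDTDTQYF", "CATSATLRVVAEKLFF", "CAVSGGFQKLVF"),
  stringsAsFactors = FALSE
)

tra <- contigs[contigs$chain == "TRA", c("barcode", "cdr3")]
trb <- contigs[contigs$chain == "TRB", c("barcode", "cdr3")]

# Full outer join keeps barcodes that returned only one chain (missing chain = NA)
paired <- merge(tra, trb, by = "barcode", all = TRUE, suffixes = c("_TRA", "_TRB"))

# Combined amino acid call in TRA_TRB order; paste() writes a missing chain as "NA"
paired$CTaa <- paste(paired$cdr3_TRA, paired$cdr3_TRB, sep = "_")

# Relabel barcodes with sample and ID so they stay unique across runs
paired$barcode <- paste("PX", "P", paired$barcode, sep = "_")
```

The resulting CTaa strings have the same TRA_TRB layout, with NA marking an unreturned chain, that appears in the combined objects used throughout this vignette.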
A basic analysis of the called contigs can be visualized with several functions in scRepertoire. Before visualization though, it’s important to think about how you’d like to call the clonotypes.
What if there are more variables to add than just sample and ID? We can add them by using the addVariable() function. All we need is the name of the variable you’d like to add and the specific character or numeric values (variables). As an example, here we add the batches in which the samples were processed and sequenced.
example <- addVariable(combined, name = "batch", variables = c("b1", "b1", "b2", "b2", "b2", "b2"))
example[[1]][1:5,ncol(example[[1]])] # This is showing the first 5 values of the new column added
## [1] "b1" "b1" "b1" "b1" "b1"
Likewise, we can remove specific list elements after combineContigs() using the subsetContig() function. In order to subset, we need to identify the vector we would like to use for subsetting (name) and also the variable values to subset (variables). Below you can see us isolate just the 4 sequencing results from PX and PY.
subset <- subsetContig(combined, name = "sample", variables = c("PX", "PY"))
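For readers who want to see what adding a per-run variable and subsetting amount to on a plain list of data frames, here is a rough base R sketch. The list runs and its contents are toy assumptions; the snippet does not call or reproduce addVariable() or subsetContig().

```r
# Toy list of runs, one data frame per run (hypothetical values)
runs <- list(
  PY_P = data.frame(barcode = c("PY_P_AAAC", "PY_P_CCCG"), sample = "PY",
                    stringsAsFactors = FALSE),
  PX_P = data.frame(barcode = "PX_P_GGGT", sample = "PX", stringsAsFactors = FALSE),
  PZ_P = data.frame(barcode = "PZ_P_TTTA", sample = "PZ", stringsAsFactors = FALSE)
)

# Add one batch value per run, recycled over the rows of that run
batch_values <- c("b1", "b2", "b2")
with_batch <- Map(function(df, b) { df$batch <- b; df }, runs, batch_values)

# Keep only the runs whose sample is PX or PY
keep <- vapply(with_batch, function(df) all(df$sample %in% c("PX", "PY")), logical(1))
px_py_only <- with_batch[keep]
```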
It is important to note that the clonotype is called using the combination of genes or nt/aa CDR3 sequences for both loci. As of this implementation of scRepertoire, clonotype calling does not incorporate small variations within the CDR3 sequences. As such, the gene approach will be the most sensitive, the use of nt or aa moderately so, and gene+nt the most specific. Additionally, the clonotype call tries to incorporate both loci (i.e., both the TCRA and TCRB chains), including cases where a single cell barcode has multiple sequences identified (e.g., 2 TCRA chains expressed in one cell). Using the 10x approach, there is a subset of barcodes that only return one of the immune receptor chains; the unreturned chain is assigned an NA value.
The first function to explore the clonotypes is quantContig(), which returns the total or relative numbers of unique clonotypes. With scale = TRUE, it reports the relative percent of unique clonotypes scaled by the total size of the clonotype repertoire; with scale = FALSE, it reports the total number of unique clonotypes.
quantContig(combined, call="gene+nt", scale = T)
ggsave("Figure2A_1.pdf", height=2, width=4)
## Warning: Removed 6 rows containing missing values (geom_errorbar).
Within each of the general analysis functions, there is the ability to export the data frame used to create the visualization. To get the exported values, use exportTable = T. It will return the data into the global environment labeled as functionName_output.
quantContig(combined, call="gene+nt", scale = T, exportTable = T)
quantContig_output
## contigs values total scaled
## 1 2692 PY_P 3208 83.91521
## 2 1513 PY_T 3119 48.50914
## 3 823 PX_P 1068 77.05993
## 4 928 PX_T 1678 55.30393
## 5 1147 PZ_P 1434 79.98605
## 6 764 PZ_T 2768 27.60116
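The scaled column in the table above is consistent with the number of unique clonotypes divided by the total, times 100. A small base R check, using a toy vector of clonotype calls (hypothetical) alongside the PY_P row from the output:

```r
# Toy clonotype calls for one run: 4 unique clonotypes among 7 cells
clonotypes <- c("A", "A", "B", "C", "C", "C", "D")
unique_n <- length(unique(clonotypes))
total_n  <- length(clonotypes)
scaled   <- 100 * unique_n / total_n

# Same arithmetic applied to the PY_P row reported above
py_p_scaled <- 100 * 2692 / 3208
```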
The other option here is to define the visualization by data classes. Here we used combineContigs() to define the ID variable as part of the naming structure. We can use the column parameter to select a column in the data set by which to organize the visualization.
quantContig(combined, call="gene", column = "ID", scale = T)
ggsave("Figure2A_2.pdf", height=2, width=4)
We can also examine the relative distribution of clonotypes by abundance. Here abundanceContig() will produce a line graph with the total number of clonotypes by the number of instances within the sample or run. As above, we can also group this by vectors within the contig object using the column parameter of the function.
plot1 <- abundanceContig(combined, call = "gene", scale = F)
plot2 <- abundanceContig(combined, call = "gene", column = "ID", scale = F)
x <- gridExtra::grid.arrange(plot1, plot2, ncol=1)
ggsave("Figure2B.pdf", x, height=4, width=4)
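The quantity on such an abundance plot can be sketched as a "table of a table": first count how many cells carry each clonotype, then count how many clonotypes occur at each size. The clonotype vector below is a toy assumption, and this is not the internal code of abundanceContig().

```r
# Toy clonotype calls for one run
clonotypes <- c("A", "A", "B", "C", "C", "C", "D")

# Number of cells per clonotype (A = 2, B = 1, C = 3, D = 1)
clone_sizes <- as.vector(table(clonotypes))

# Number of clonotypes observed at each clone size
abundance <- table(clone_sizes)
```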
As you can see, the peripheral blood sample derived from patient 1 is a relative extreme outlier. Another method to examine the relative abundance is to look at the density by using the scale parameter of the function.
abundanceContig(combined, column = "ID", scale = T)
## Warning in if (call == "gene") {: the condition has length > 1 and only the
## first element will be used
ggsave("Figure2C.pdf", height=4, width=2)
Lastly, on the basic visualization side, we can look at the length distribution of the CDR3 sequences by calling the lengthContig() function. Importantly, unlike the other basic visualizations, the call can only be “nt” or “aa”. Due to the method of calling clonotypes as outlined above, the length should reveal a multimodal curve; this is a product of using NA for the unreturned chain sequence and of multiple chains within a single barcode.
lengthContig(combined, call="aa", chains = "combined")
ggsave("Figure2D.pdf", height=2, width=5.5)
Or we can visualize the individual chains of the immune receptors by selecting chains = “single”. Notably, this will remove the NA component of combined clonotypes, so the visualization shows only the sequences recovered in the filtered contig annotation file from Cell Ranger.
lengthContig(combined, call="nt", chains = "single")
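The difference between the combined and single-chain views can be illustrated with plain character counts. This sketch assumes the TRA_TRB string layout shown elsewhere in this vignette and simply counts characters; how lengthContig() itself treats the NA placeholder is not shown here.

```r
# Two combined amino acid calls: one paired, one with the TRA chain missing
CTaa <- c("CAVNGGSQGNLIF_CSAEREDTDTQYF", "NA_CATSATLRVVAEKLFF")

# Combined view: one length per barcode, so paired and unpaired calls
# fall into different length ranges
combined_len <- nchar(CTaa)

# Single-chain view: split on "_" and drop the NA placeholder
chains <- unlist(strsplit(CTaa, "_", fixed = TRUE))
chains <- chains[chains != "NA"]
single_len <- nchar(chains)
```

Barcodes with one chain cluster at shorter combined lengths than barcodes with two, which is one way a multimodal curve can arise.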
We can also look at clonotypes between samples and changes in dynamics by using the compareClonotypes() function.
compareClonotypes(combined, numbers = 10, samples = c("PX_P", "PX_T"), call="aa", graph = "alluvial")
After we have completed the basic processing and summary functions in scRepertoire, we can begin to explore the clonotypes of the single-cell data in more detail.
By examining the clonal space, we are effectively looking at the relative space occupied by clones at specific proportions. Another way to think about this would be thinking of the total immune receptor sequencing run as a measuring cup. In this cup, we will fill liquids of different viscosity - or different number of clonal proportions. Clonal space homeostasis is asking what percentage of the cup is filled by clones in distinct proportions (or liquids of different viscosity, to extend the analogy). The proportional cutpoints are set under the cloneType variable in the function and can be adjusted, at baseline the bins are as follows:
plot1 <- clonalHomeostasis(combined, call = "gene")
plot2 <- clonalHomeostasis(combined, call = "aa")
x <- gridExtra::grid.arrange(plot1, plot2, ncol=1)
ggsave("Figure3A.pdf", x, height=4, width=6)
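The measuring-cup idea can be sketched in base R by binning each clonotype by its proportion of the run and summing the proportions in each bin. The cutpoints and bin names below are hypothetical values chosen for the toy data, not the package defaults, and the snippet is not the code of clonalHomeostasis().

```r
# Toy run of 100 cells across five clonotypes
clonotypes <- c(rep("A", 50), rep("B", 30), rep("C", 15), rep("D", 4), "E")

# Proportion of the run occupied by each clonotype
prop <- as.numeric(table(clonotypes)) / length(clonotypes)

# Hypothetical proportional cutpoints (upper bounds of each bin)
cutpoints <- c(Rare = 0.02, Small = 0.1, Medium = 0.25, Large = 1)
bins <- cut(prop, breaks = c(0, cutpoints), labels = names(cutpoints))

# Share of the "cup" filled by clones in each bin
space <- tapply(prop, bins, sum)
```

Here the two largest clones together fill 80% of the repertoire, while the rarest clone fills 1%.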
Like clonal space homeostasis above, clonal proportion acts to place clones into separate bins. The key difference is instead of looking at the relative proportion of the clone to the total, the clonalProportion() function will rank the clones by total number and place them into bins.
The split represents ranking of clonotypes by copy or frequency of occurrence, meaning 1:10 are the top 10 clonotypes in each sample. The default bins are under the split variable in the function and can be adjusted, but at baseline they are as follows.
plot1 <- clonalProportion(combined, call = "gene")
plot2 <- clonalProportion(combined, call = "nt")
x <- gridExtra::grid.arrange(plot1, plot2, ncol=1)
ggsave("Figure3B.pdf", x, height=4, width=6)
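Rank-based binning can be sketched in the same way: order the clonotypes by size, bin them by rank, and sum the cells in each bin. The split values below are hypothetical and much smaller than anything useful for real data; this is an illustration, not the code of clonalProportion().

```r
# Toy run of 100 cells across five clonotypes
clonotypes <- c(rep("A", 50), rep("B", 30), rep("C", 15), rep("D", 4), "E")

# Clone sizes from largest to smallest, and their ranks
counts <- sort(table(clonotypes), decreasing = TRUE)
ranks  <- seq_along(counts)

# Hypothetical rank bins: top 2 clones, ranks 3 to 4, and everything after
rank_bins <- cut(ranks, breaks = c(0, 2, 4, Inf), labels = c("1:2", "3:4", "5+"))

# Number of cells contributed by the clones in each rank bin
cells_per_bin <- tapply(as.numeric(counts), rank_bins, sum)
```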
If you are interested in measures of similarity between the samples loaded into scRepertoire, clonalOverlap() can assist in the visualization. Two methods can currently be performed in clonalOverlap(): 1) the overlap coefficient and 2) the Morisita index. The former looks at the overlap of clonotypes scaled to the number of unique clonotypes in the smaller sample. The Morisita index is more complex: it is an ecological measure of the dispersion of individuals within a population, incorporating the size of the population.
clonalOverlap(combined, call = "gene+nt", method = "morisita")
ggsave("Figure3C.pdf", height = 2.5, width = 4.5)
## Warning: Removed 15 rows containing missing values (geom_text).
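Both measures can be written out for two toy samples in a few lines of base R. The Morisita calculation below uses the textbook form of Morisita's overlap index; it is an assumption that this matches what clonalOverlap() computes, and the counts are invented for illustration.

```r
# Toy clonotype counts in two samples (same clonotype order in both)
x <- c(A = 5, B = 3, C = 2, D = 0)
y <- c(A = 4, B = 0, C = 2, D = 4)

# Overlap coefficient: shared clonotypes over the smaller set of unique clonotypes
shared       <- sum(x > 0 & y > 0)
overlap_coef <- shared / min(sum(x > 0), sum(y > 0))

# Morisita's overlap index (textbook form), which also uses the sample sizes
X   <- sum(x)
Y   <- sum(y)
d_x <- sum(x * (x - 1)) / (X * (X - 1))
d_y <- sum(y * (y - 1)) / (Y * (Y - 1))
morisita <- 2 * sum(x * y) / ((d_x + d_y) * X * Y)
```

The overlap coefficient only asks whether a clonotype is present in both samples, whereas the Morisita index also weighs how many cells carry it.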
Diversity can also be measured for samples or by other variables. Diversity is calculated using four metrics: 1) Shannon, 2) inverse Simpson, 3) Chao1, and 4) Abundance-based Coverage Estimator (ACE). The former two are generally used to estimate baseline diversity, while the Chao1 and ACE indices are used to estimate the richness of the samples.
clonalDiversity(combined, call = "gene", colorBy = "samples")
clonalDiversity(combined, call = "gene", colorBy = "ID")
ggsave("Figure3D.eps", height=2, width=5)
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-transparency is
## not supported on this device: reported only once per page
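Three of the four metrics are simple enough to compute by hand, which can help when sanity-checking a plot. The sketch below uses the standard textbook formulas on invented clone sizes; it does not reproduce the code path of clonalDiversity(), and ACE is omitted because it needs more bookkeeping.

```r
# Toy clone sizes: number of cells per clonotype in one sample
counts <- c(4, 2, 2, 1, 1)
p <- counts / sum(counts)

# Shannon index: -sum(p * log(p)), natural log
shannon <- -sum(p * log(p))

# Inverse Simpson index: 1 / sum(p^2)
inv_simpson <- 1 / sum(p^2)

# Classic Chao1 richness estimate: observed richness plus a correction
# from the number of singletons (f1) and doubletons (f2); needs f2 > 0
f1 <- sum(counts == 1)
f2 <- sum(counts == 2)
chao1 <- length(counts) + f1^2 / (2 * f2)
```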
As mentioned previously, this data set is derived from work performed in the laboratory of Weizhou Zhang. We have elected to pair the workflow of scRepertoire with the excellent Seurat package, for greater usability. The first step is to load the Seurat object and visualize the data.
seurat <- get(load("/Users/nick/seurat2.rda"))
DimPlot(seurat, label = T) + NoLegend()
## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.
ggsave("Figure4A.eps", height=3, width=3.5)
Here you can see we have 12 total clusters (C1-12), which we have labeled as such for simplicity. We can also get a little more granular information on the number of cells by using the table() function.
table(seurat@active.ident)
##
## C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
## 2293 2138 1746 1419 1167 1128 807 792 495 357 328 241
Next we can take the clonotypic information and attach it to our Seurat object using the combineSeurat() function. Importantly, the major requirement for the attachment is matching contig cell barcodes and barcodes in the row names of the seurat@meta.data. If these do not match, the attachment will fail. For ease, we suggest you make the changes to the Seurat object row names. scRepertoire also has a function, changeNames(), that can be used to replace specific strings with ones that match the barcodes in the combined sequences. The function uses gsub(), so it will replace every occurrence of one string with the new string.
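Before attaching, it can be worth checking how many barcodes actually match. The following base R sketch uses invented barcodes in which the metadata row names use "-" where the contigs use "_"; the vectors and the separator mismatch are assumptions for illustration.

```r
# Hypothetical barcodes: row names of the metadata vs. barcodes in the contigs
meta_barcodes   <- c("PY-P-AAACCTGAGAGCTGGT", "PY-T-AAACCTGAGTAGCCGA")
contig_barcodes <- c("PY_P_AAACCTGAGAGCTGGT", "PY_T_AAACCTGAGTAGCCGA",
                     "PY_T_AAACCTGAGTGGTAAT")

# Fraction of metadata barcodes found among the contig barcodes
match_before <- mean(meta_barcodes %in% contig_barcodes)

# gsub() replaces every occurrence, so both separators change at once
fixed_barcodes <- gsub("-", "_", meta_barcodes, fixed = TRUE)
match_after <- mean(fixed_barcodes %in% contig_barcodes)
```

Because every occurrence is replaced, a barcode that still carried a "-1" suffix would have that hyphen rewritten as well, so it is worth inspecting the result before attaching.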
We can call the 4 variations of clonotypes: 1) genes, 2) CDR3 amino acid sequence, 3) CDR3 nucleotide sequence, or 4) genes and CDR3 nucleotide sequence. The attaching function will also calculate the frequency of the clonotype based on the groupBy variable. If blank, groupBy will calculate frequencies of clonotypes by individual run, but because we have 6 samples of paired peripheral and tumor T cells, we are actually going to use the groupBy variable to call “sample” in order to calculate frequencies across both the peripheral blood and tumor T cells of the same patient.
Lastly, in order to categorize the frequency, we have the variable cloneTypes which acts as a bin to place labels. As a default, cloneTypes is set to equal c(Single = 1, Small = 5, Medium = 20, Large = 100, Hyperexpanded = 500). This is because the highest repeated clonotype is in Patient 3 with just under 500 clones. If your data has a clone with greater expansion, you should readjust the cutpoints.
seurat <- combineSeurat(combined, seurat, call="gene", groupBy = "sample")
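The way frequency cutpoints turn into labeled bins can be sketched with cut(). The cutpoints are the defaults quoted above; the frequency vector is invented, and the label format is modeled on the cloneType levels that appear later in this vignette rather than taken from the package source.

```r
# Default cutpoints quoted above
cloneTypes <- c(Single = 1, Small = 5, Medium = 20, Large = 100, Hyperexpanded = 500)

# Build labels of the form "Name (lower < X <= upper)"
lower      <- c(0, head(cloneTypes, -1))
bin_labels <- paste0(names(cloneTypes), " (", lower, " < X <= ", cloneTypes, ")")

# Toy clonotype frequencies, including a cell with no TCR information
frequency <- c(1, 4, 2, 25, 482, NA)
cloneType <- cut(frequency, breaks = c(0, cloneTypes), labels = bin_labels)
```

A frequency above the largest cutpoint would come back as NA from cut(), which is one way to see why the cutpoints should be readjusted for data with more expanded clones.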
We first want to look at the distribution of peripheral versus tumor T cells. We can use the same color scheme as the rest of the scRepertoire package by calling the object colorblind_vector using the following hex codes.
colorblind_vector <- colorRampPalette(c("#FF4B20", "#FFB433", "#C6FDEC", "#7AC5FF", "#0348A6"))
DimPlot(seurat, group.by = "Type") + NoLegend() +
scale_color_manual(values=colorblind_vector(2))
ggsave("Figure4B_1.eps", height=3, width=3.5)
We can also look at the composition of each cluster by comparing the proportion of the cluster comprised of peripheral blood versus tumor T cells. We can do this by first forming a table of the cluster and type of cells, then scaling the rows of the table by the total number of cells sequenced.
table <- table(seurat$Type, seurat@active.ident)
table[1,] <- table[1,]/sum(table[1,]) #Scaling by the total number of peripheral T cells
table[2,] <- table[2,]/sum(table[2,]) #Scaling by the total number of tumor T cells
table <- as.data.frame(table)
table$Var2 <- factor(table$Var2, levels = c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10", "C11", "C12"))
ggplot(table, aes(x=Var2, y=Freq, fill=Var1)) +
geom_bar(stat="identity", position="fill", color="black", lwd=0.25) +
theme(axis.title.x = element_blank()) +
scale_fill_manual(values = c("#FF4B20","#0348A6")) +
theme_classic() +
theme(axis.title = element_blank()) +
guides(fill=F)
ggsave("Figure4B_2.pdf", height=2, width=2.5)
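The row-by-row scaling used above can also be done in one step with prop.table(), which divides each row of a table by its row total when margin = 1. A small self-contained example with invented cell labels:

```r
# Toy labels: tissue type and cluster for eight cells
type    <- c("P", "P", "P", "T", "T", "T", "T", "T")
cluster <- c("C1", "C2", "C2", "C1", "C1", "C1", "C2", "C2")

tab <- table(type, cluster)

# Scale each row (tissue type) by its own total number of cells
scaled_tab <- prop.table(tab, margin = 1)
```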
Now we can look at the distribution of the clonotype bins by first ordering cloneType as a factor, which prevents the coloring from being in alphabetical order. Next we use the DimPlot() function in Seurat with our additional scale_color_manual layer.
seurat@meta.data$cloneType <- factor(seurat@meta.data$cloneType, levels = c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)", "Medium (5 < X <= 20)", "Small (1 < X <= 5)", "Single (0 < X <= 1)", NA))
DimPlot(seurat, group.by = "cloneType") +
scale_color_manual(values = c(rev(colorblind_vector(5))), na.value="grey")
ggsave("Figure4C.eps", height=3, width=6.5)
We can also use the output of the combineSeurat() function to take a look at the clonotypic frequency by cluster.
meta <- data.frame(seurat@meta.data, seurat@active.ident)
ggplot(meta, aes(x=seurat.active.ident, y=Frequency)) +
geom_boxplot(outlier.alpha = 0, aes(fill=seurat.active.ident)) +
guides(fill=F) +
theme_classic() +
theme(axis.title.x = element_blank())
## Warning: Removed 2320 rows containing non-finite values (stat_boxplot).
ggsave("Figure4D.pdf", height=2, width = 3)
## Warning: Removed 2320 rows containing non-finite values (stat_boxplot).
We can also look at specific clonotypes by calling their sequences in highlightClonotypes() below. In order to highlight the clonotypes, we first need to use call to set the type of sequence we will be using and then pass the specific sequences themselves using sequence. Below you can see the steps to highlight the two most prominent sequences: “CAVNGGSQGNLIF_CSAEREDTDTQYF” with a frequency of 482 (clonotype 1) and “NA_CATSATLRVVAEKLFF” with a frequency of 287 (clonotype 2).
seurat <- highlightClonotypes(seurat, call= "aa", sequence = c("CAVNGGSQGNLIF_CSAEREDTDTQYF", "NA_CATSATLRVVAEKLFF"))
DimPlot(seurat, group.by = "highlight")
ggsave("Figure4E.eps", height=3, width=4.75)
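Conceptually, a highlight column keeps the sequence for the cells that carry one of the requested clonotypes and leaves everything else as NA. A base R sketch on a toy metadata frame (the frame and its rows are assumptions; this is not the code of highlightClonotypes()):

```r
# Toy metadata with a combined amino acid call per cell
meta <- data.frame(
  CTaa = c("CAVNGGSQGNLIF_CSAEREDTDTQYF", "CAVSGGFQKLVF_NA", "NA_CATSATLRVVAEKLFF"),
  stringsAsFactors = FALSE
)
targets <- c("CAVNGGSQGNLIF_CSAEREDTDTQYF", "NA_CATSATLRVVAEKLFF")

# Keep the sequence for requested clonotypes, NA for every other cell
meta$highlight <- ifelse(meta$CTaa %in% targets, meta$CTaa, NA)
```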
Lastly, after all the metadata has been modified, we can look at clonotypes across multiple categories using the alluvialClonotypes() function. To understand the basic concepts of this graphing method, I’d highly recommend reading this post. Essentially, we are able to use the plots to examine the interchange of categorical variables. Because this function will produce a graph with each clonotype arranged by the called stratifications, it will take some time depending on the total number of cells.
alluvialClonotypes(seurat, call = "gene", compare = "cluster", facet = "Patient")
## Warning in if (is.na(meta[, compare])) {: the condition has length > 1 and only
## the first element will be used
ggsave("Figure4F.pdf", height=2, width=4)
## Warning in f(..., self = self): Aesthetic `label` is specified, so parameter
## `infer.label` will be ignored.
## Warning in f(..., self = self): Aesthetic `label` is specified, so parameter
## `infer.label` will be ignored.
## Warning in f(..., self = self): Aesthetic `label` is specified, so parameter
## `infer.label` will be ignored.
## Warning: Removed 2 rows containing missing values (geom_fit_text).
After adding the clonotype information to the Seurat object, we can also look at clonotypic differences between clusters using some of the previous functions.
clonalDiversity(seurat, call = "nt", colorBy = "cluster")
clonalHomeostasis(seurat, call = "nt")
clonalProportion(seurat, call = "nt")
clonalOverlap(seurat, call="aa", method="overlap")
For users that would like greater ability to use the meta data in the Seurat objects to perform the analysis that scRepertoire provides, there is also the option of using the seurat2List() function that will take the meta data and output the data as a list by cluster.
combined2 <- seurat2List(seurat)
head(combined2[[1]])
## nCount_RNA nFeature_RNA integrated_snn_res.0.5
## PY_T_AAACCTGAGTAGCCGA 5726 1969 0
## PY_T_AAACCTGAGTGGTAAT 4413 1444 0
## PY_T_AAACGGGAGCTAACTC 3214 1213 0
## PY_T_AAACGGGCAGTCGATT 3959 1220 0
## PY_T_AAACGGGCATTAGGCT 3231 1208 0
## PY_T_AAAGATGGTTATCCGA 4296 1459 0
## seurat_clusters Patient Type RawBarcode
## PY_T_AAACCTGAGTAGCCGA 0 PY T AAACCTGAGTAGCCGA
## PY_T_AAACCTGAGTGGTAAT 0 PY T AAACCTGAGTGGTAAT
## PY_T_AAACGGGAGCTAACTC 0 PY T AAACGGGAGCTAACTC
## PY_T_AAACGGGCAGTCGATT 0 PY T AAACGGGCAGTCGATT
## PY_T_AAACGGGCATTAGGCT 0 PY T AAACGGGCATTAGGCT
## PY_T_AAAGATGGTTATCCGA 0 PY T AAAGATGGTTATCCGA
## barcode
## PY_T_AAACCTGAGTAGCCGA PY_T_AAACCTGAGTAGCCGA
## PY_T_AAACCTGAGTGGTAAT PY_T_AAACCTGAGTGGTAAT
## PY_T_AAACGGGAGCTAACTC PY_T_AAACGGGAGCTAACTC
## PY_T_AAACGGGCAGTCGATT PY_T_AAACGGGCAGTCGATT
## PY_T_AAACGGGCATTAGGCT PY_T_AAACGGGCATTAGGCT
## PY_T_AAAGATGGTTATCCGA PY_T_AAAGATGGTTATCCGA
## CTgene
## PY_T_AAACCTGAGTAGCCGA TRAV3.TRAJ31.TRAC_TRBV6-3.TRBJ1-1.TRBD2.TRBC1
## PY_T_AAACCTGAGTGGTAAT TRAV17.TRAJ8.TRAC_NA
## PY_T_AAACGGGAGCTAACTC NA_TRBV6-5.TRBJ2-2.TRBD1.TRBC2
## PY_T_AAACGGGCAGTCGATT TRAV1-2.TRAJ33.TRAC_TRBV20-1.TRBJ2-1.TRBD2.TRBC2
## PY_T_AAACGGGCATTAGGCT TRAV12-2.TRAJ57.TRAC_TRBV7-2.TRBJ1-2.TRBD1.TRBC1
## PY_T_AAAGATGGTTATCCGA TRAV29DV5.TRAJ23.TRAC_TRBV2.TRBJ2-3.TRBD2.TRBC2
## CTnt
## PY_T_AAACCTGAGTAGCCGA TGTGCTGCACACAATGCCAGACTCATGTTT_TGTGCCAGCAGTAAAACAGGACTCAACACTGAAGCTTTCTTT
## PY_T_AAACCTGAGTGGTAAT TGTGCTGTCTCCGGGGGCTTTCAGAAACTTGTATTT_NA
## PY_T_AAACGGGAGCTAACTC NA_TGTGCCAGCAGTTACTCGAAATCAGGGTTCGGGGAGCTGTTTTTT
## PY_T_AAACGGGCAGTCGATT TGTGCTGGCATGGATAGCAACTATCAGTTAATCTGG_TGCAGTGCCCCGCGGGGGGGGAGGGATTACAATGAGCAGTTCTTC
## PY_T_AAACGGGCATTAGGCT TGTGCCGTGAACATCCCTCAGGGCGGATCTGAAAAGCTGGTCTTT_TGTGCCAGCAGCTCTAGGGAAAGAGCTAACTATGGCTACACCTTC
## PY_T_AAAGATGGTTATCCGA TGTGCAGCAAGCGCGCGTAACCAGGGAGGAAAGCTTATCTTC_TGTGCCAGCAGTGAAGAGGCTAGGGCAGGCGATACGCAGTATTTT
## CTaa
## PY_T_AAACCTGAGTAGCCGA CAAHNARLMF_CASSKTGLNTEAFF
## PY_T_AAACCTGAGTGGTAAT CAVSGGFQKLVF_NA
## PY_T_AAACGGGAGCTAACTC NA_CASSYSKSGFGELFF
## PY_T_AAACGGGCAGTCGATT CAGMDSNYQLIW_CSAPRGGRDYNEQFF
## PY_T_AAACGGGCATTAGGCT CAVNIPQGGSEKLVF_CASSSRERANYGYTF
## PY_T_AAAGATGGTTATCCGA CAASARNQGGKLIF_CASSEEARAGDTQYF
## CTstrict
## PY_T_AAACCTGAGTAGCCGA TRAV3.TRAJ31.TRAC_TGTGCTGCACACAATGCCAGACTCATGTTT_TRBV6-3.TRBJ1-1.TRBD2.TRBC1_TGTGCCAGCAGTAAAACAGGACTCAACACTGAAGCTTTCTTT
## PY_T_AAACCTGAGTGGTAAT TRAV17.TRAJ8.TRAC_TGTGCTGTCTCCGGGGGCTTTCAGAAACTTGTATTT_NA_NA
## PY_T_AAACGGGAGCTAACTC NA_NA_TRBV6-5.TRBJ2-2.TRBD1.TRBC2_TGTGCCAGCAGTTACTCGAAATCAGGGTTCGGGGAGCTGTTTTTT
## PY_T_AAACGGGCAGTCGATT TRAV1-2.TRAJ33.TRAC_TGTGCTGGCATGGATAGCAACTATCAGTTAATCTGG_TRBV20-1.TRBJ2-1.TRBD2.TRBC2_TGCAGTGCCCCGCGGGGGGGGAGGGATTACAATGAGCAGTTCTTC
## PY_T_AAACGGGCATTAGGCT TRAV12-2.TRAJ57.TRAC_TGTGCCGTGAACATCCCTCAGGGCGGATCTGAAAAGCTGGTCTTT_TRBV7-2.TRBJ1-2.TRBD1.TRBC1_TGTGCCAGCAGCTCTAGGGAAAGAGCTAACTATGGCTACACCTTC
## PY_T_AAAGATGGTTATCCGA TRAV29DV5.TRAJ23.TRAC_TGTGCAGCAAGCGCGCGTAACCAGGGAGGAAAGCTTATCTTC_TRBV2.TRBJ2-3.TRBD2.TRBC2_TGTGCCAGCAGTGAAGAGGCTAGGGCAGGCGATACGCAGTATTTT
## Frequency cloneType highlight cluster
## PY_T_AAACCTGAGTAGCCGA 1 Single (0 < X <= 1) <NA> C1
## PY_T_AAACCTGAGTGGTAAT 1 Single (0 < X <= 1) <NA> C1
## PY_T_AAACGGGAGCTAACTC 4 Small (1 < X <= 5) <NA> C1
## PY_T_AAACGGGCAGTCGATT 1 Single (0 < X <= 1) <NA> C1
## PY_T_AAACGGGCATTAGGCT 1 Single (0 < X <= 1) <NA> C1
## PY_T_AAAGATGGTTATCCGA 2 Small (1 < X <= 5) <NA> C1
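The shape of this output, one data frame of metadata per cluster, is what base R's split() produces. The sketch below uses a toy metadata frame with invented barcodes and is only meant to show the list-by-cluster structure, not the internals of seurat2List().

```r
# Toy metadata: one row per cell, row names are the barcodes
meta <- data.frame(
  cluster = c("C1", "C2", "C1", "C3"),
  CTaa    = c("CAAHNARLMF_CASSKTGLNTEAFF", "CAVSGGFQKLVF_NA",
              "NA_CASSYSKSGFGELFF", "CAGMDSNYQLIW_CSAPRGGRDYNEQFF"),
  row.names = c("PY_T_AAAC", "PY_T_CCCG", "PY_T_GGGT", "PY_T_TTTA"),
  stringsAsFactors = FALSE
)

# One list element per cluster, each holding the rows for that cluster
by_cluster <- split(meta, meta$cluster)
```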
This has been a general overview of the capabilities of scRepertoire, from the initial processing and visualization to attaching the clonotype information to the mRNA expression values in a Seurat object. If you have any questions, comments, or suggestions, feel free to visit the GitHub repository or email me.
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.15.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Seurat_3.1.2 scRepertoire_0.0.1 vegan_2.5-6 lattice_0.20-38
## [5] permute_0.9-5 reshape2_1.4.3 RColorBrewer_1.1-2 ggfittext_0.8.1
## [9] ggalluvial_0.11.1 ggplot2_3.2.1 dplyr_0.8.3 colorRamps_2.3
##
## loaded via a namespace (and not attached):
## [1] TH.data_1.0-10 Rtsne_0.15 colorspace_1.4-1
## [4] ellipsis_0.3.0 ggridges_0.5.1 farver_2.0.1
## [7] leiden_0.3.1 listenv_0.8.0 npsurv_0.4-0
## [10] ggrepel_0.8.1 mvtnorm_1.0-11 codetools_0.2-16
## [13] splines_3.5.1 R.methodsS3_1.7.1 mnormt_1.5-5
## [16] lsei_1.2-0 knitr_1.26 TFisher_0.2.0
## [19] zeallot_0.1.0 jsonlite_1.6 ica_1.0-2
## [22] cluster_2.1.0 png_0.1-7 R.oo_1.23.0
## [25] uwot_0.1.5 sctransform_0.2.1 compiler_3.5.1
## [28] httr_1.4.1 backports_1.1.5 assertthat_0.2.1
## [31] Matrix_1.2-18 lazyeval_0.2.2 htmltools_0.4.0
## [34] tools_3.5.1 rsvd_1.0.2 igraph_1.2.4.2
## [37] gtable_0.3.0 glue_1.3.1 RANN_2.6.1
## [40] Rcpp_1.0.3 Biobase_2.42.0 vctrs_0.2.1
## [43] multtest_2.38.0 gdata_2.18.0 ape_5.3
## [46] nlme_3.1-143 gbRd_0.4-11 lmtest_0.9-37
## [49] xfun_0.11 stringr_1.4.0 globals_0.12.5
## [52] lifecycle_0.1.0 irlba_2.3.3 gtools_3.8.1
## [55] future_1.15.1 MASS_7.3-51.5 zoo_1.8-6
## [58] scales_1.1.0 parallel_3.5.1 sandwich_2.5-1
## [61] yaml_2.2.0 gridExtra_2.3 reticulate_1.14
## [64] pbapply_1.4-2 stringi_1.4.5 mutoss_0.1-12
## [67] plotrix_3.7-7 caTools_1.17.1.3 BiocGenerics_0.28.0
## [70] bibtex_0.4.2.1 Rdpack_0.11-1 SDMTools_1.1-221.2
## [73] rlang_0.4.2 pkgconfig_2.0.3 bitops_1.0-6
## [76] evaluate_0.14 ROCR_1.0-7 purrr_0.3.3
## [79] labeling_0.3 htmlwidgets_1.5.1 cowplot_1.0.0
## [82] tidyselect_0.2.5 RcppAnnoy_0.0.14 plyr_1.8.5
## [85] magrittr_1.5 R6_2.4.1 gplots_3.0.1.1
## [88] multcomp_1.4-11 pillar_1.4.3 withr_2.1.2
## [91] mgcv_1.8-31 sn_1.5-4 fitdistrplus_1.0-14
## [94] survival_3.1-8 tsne_0.1-3 tibble_2.1.3
## [97] future.apply_1.3.0 crayon_1.3.4 KernSmooth_2.23-16
## [100] plotly_4.9.1 rmarkdown_2.0 grid_3.5.1
## [103] data.table_1.12.8 metap_1.2 digest_0.6.23
## [106] tidyr_1.0.0 numDeriv_2016.8-1.1 R.utils_2.9.2
## [109] RcppParallel_4.4.4 stats4_3.5.1 munsell_0.5.0
## [112] viridisLite_0.3.0